Lo JY, Kornguth PJ, Floyd CE Jr. Multi-Institution Evaluation of BIRADS Breast
Cancer Prediction Model. Radiology 209(P):271, 1998.


Introduction 

Biopsy  is  considered  to  be  a  definitive  test  to  rule  out  breast  cancer  for  those 
patients  who  participate  in  breast  screening  examinations  and  whose 
mammograms  are  interpreted  as  having  suspicious  findings.  Excisional  biopsy  is 
a sensitive and specific test for breast cancer[1]. If the cost of excisional biopsy
were minimal, this would be an ideal test for breast cancer malignancy.
Unfortunately, the cost of this procedure, in both monetary and emotional terms,
is significant[2,3,4]. Moreover, to achieve a high sensitivity for detecting
cancer,  many  women  with  mammographic  findings  due  to  benign  processes 
undergo  biopsy.  The  false  positive  rate  for  the  decision  to  biopsy  is  currently 
between  66%  and  90%.  The  goal  of  the  work  described  here  is  to  design  a 
decision  tool  to  support  the  decision  to  biopsy.  This  decision  aid  must  maintain 
the  current  high  detection  rate  for  true  cancers  while  accurately  ruling  out  some 
of  the  benign  cases  and  thus  avoiding  unnecessary  biopsies. 

The problem of classifying suspicious mammographic lesions as benign or
malignant is recognized as a difficult task. There is considerable variation in
the skill with which the task is performed, even within the group of radiologists
who specialize in this task. The radiographic manifestation of breast cancer is
not well enough understood from a fundamental scientific basis to allow an
accurate theoretical predictive model to be constructed from first principles.
There  is  no  accurate  deterministic  model  for  relating  mammographic  findings  to 
biopsy  outcomes  although  some  general  rules  are  accepted.  Examples  of  these 
rules  are  "Older  women  are  more  likely  to  develop  breast  cancer  than  young 
women."  "If  the  margin  of  a  mass  appears  well  circumscribed,  the  mass  is  likely 
to be benign." Unfortunately, the sensitivity and specificity of those rules that
are generally agreed upon are not sufficient to allow a strict implementation over
the full range of cases that are encountered in clinical practice. While rule-based
expert systems have enjoyed success in some medical diagnostic tasks, and there
are expert mammographers whose diagnostic performance would qualify them
as experts, the construction of rule-based expert systems for this diagnostic task
has met with limited success. This is quite possibly due to the difficulty of
describing the logical and analytic process used by the experts in a form that can
be used by other mammographers. A typical difficulty with the expert system
approach  is  the  description  and  encoding  of  the  input  data:  two  radiologists 
often  will  use  similar,  but  not  exactly  the  same  descriptions  for  a  given  lesion.  It 
is  often  difficult  to  overcome  instability  in  a  model  due  to  this  potential 
ambiguity  in  the  input  data.  Another  difficulty  for  strict  rule-based  systems  is 
that  the  descriptors  used  as  inputs  to  the  model  can  be  nonspecific:  two  lesions 
with  similar  descriptions  can  have  opposite  outcomes  at  biopsy.  These 
arguments  indicate  that  an  example-based  technique  would  be  more 
appropriate.  This  is  supported  by  realizing  that  radiologists  are  trained  by 
repeatedly  examining  sample  cases  with  known  outcomes  that  are  maintained 
in a medical school's teaching files. The focus of this research has been on
developing and evaluating data-driven models, specifically artificial neural
networks (ANNs), for the task of predicting the outcome of biopsy given the
description  of  mammographic  lesions  as  inputs.  This  work  has  been  facilitated 
by  the  growing  acceptance  of  BI-RADS  as  a  standardized  lexicon  for 
mammographic  case  reporting. 

ANN  systems  for  the  prediction  task  have  been  constructed  and  evaluated  using 
a  growing  database  of  mammographic  cases  that  were  sent  to  biopsy  with  the 
results known. This work has been successful and has resulted in xxx peer-reviewed
publications, xxx invited presentations, and xxx competitive presentations,
and  has  served  to  foster  further  research  efforts  for  constructing  decision  aids  for 
the  diagnosis  of  breast  cancer  as  demonstrated  by  the  11  funded  grants  that  have 
been  awarded  since  the  start  of  this  project.  The  ideas  that  developed  into  these 
other  grants  were  realized  from  this  work. 

In  the  latter  part  of  this  research  grant,  the  investigative  emphasis  has  shifted 
from  algorithm  development  to  clinical  evaluation  reflecting  the  shift  from 
specific  aim  one  to  specific  aim  two.  After  conducting  several  preliminary 
sessions  with  mammographers  using  the  system  including  three  years  of 
presentation  of  a  live  computer  demonstration  version  at  the  Radiological 
Society of North America InfoRad exhibit, several important questions have been
identified regarding the user interface of the system. The first is the question of
how the results should be presented to the mammographer. An informal exit
interview with mammographers who used the system indicated that 70%
preferred a probabilistic output where the mammographer would be given a
number between 0 and 100 to indicate the estimated percent probability that the
case in question was malignant. The other 30% of the users did not want a
probability; they wanted a hard decision to biopsy or not to biopsy. For these
clinicians, the system threshold would be set to some value and the binary result
would be presented. All users, especially those preferring the hardwired decision
threshold, desired some indication of the certainty with which the decision was
presented.  Several  individuals  expressed  an  interest  in  being  presented  with 
"similar"  cases  from  which  the  neural  network  was  trained.  These  reasonable 
requests initiated the ideas that led to the development of the case-based reasoning (CBR) system. In an effort
to  provide  similar  "example  cases",  it  was  realized  that  cases  with  similar 
findings  would  generate  similar  ANN  outputs  even  though  these  would  not 
provide  a  complete  or  unique  subset.  A  case  findings  matching  algorithm  was 
implemented using a relational database (Microsoft ACCESS™) to simplify and
speed  the  coding.  It  was  later  found  that  this  implementation  also  dramatically 
improved  the  speed  of  execution.  With  this  case  matching  tool,  different 
definitions  of  what  constituted  a  similar  case  could  be  investigated.  It  was  found 
that, depending on how strict the definition of similarity was, the existing database
could provide between 10 and 100 similar cases for each case investigated.

While beyond the scope of this investigation, given these case identifications, it
would be straightforward to present digital versions of the cases on a monitor to
provide  partial  explanation  of  the  ANN  result.  It  was  natural  to  compute  the 
fraction  of  malignancies  to  total  cases  within  the  set  of  matched  cases.  With  the 
use  of  this  fraction  as  a  decision  variable,  a  predictive  tool  was  naturally 
implemented.  While  the  evolution  of  this  technique  proceeded  as  described 
above  from  an  effort  to  provide  explanation  to  the  mammographer  for  the 
recommendation  suggested  by  the  ANN,  it  was  soon  recognized  that  the 
resulting algorithm was an instantiation of a simple case-based reasoning
system. 


A  preliminary  investigation  was  performed  to  better  understand  the 
relationship  of  findings  to  malignancy  within  the  framework  of  the  BI-RADS 
reporting  lexicon.  A  Case-Based  Reasoning  (CBR)  approach  was  selected  for  this 
study  since  we  wished  to  examine  the  cases  and  the  similarity  between  them.  In 
this  context,  a  CBR  was  developed  and  evaluated  by  its  ability  to  predict  the 
outcome  of  biopsy  from  mammographic  findings  reported  in  the  BI-RADS 
lexicon.  To  classify  a  given  test  case  as  benign  or  malignant,  CBR  was 
implemented  by  comparing  the  case  to  all  previous  cases,  selecting  those  cases 
that were similar with regard to their findings, and examining the outcomes for
those similar cases. A decision variable was formed as the "malignancy ratio,"
computed  as  the  ratio  of  the  number  of  malignant  cases  to  the  total  number  of 
similar  or  "matched"  cases.  Performance  was  evaluated  by  generating  an  ROC 
curve  from  the  true  positive  fraction  and  the  false  positive  fraction  as  the 
threshold  was  applied  to  the  malignancy  ratio. 
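
The decision variable and the two ROC fractions described above can be written compactly as follows. This is a sketch of the definitions only; the symbols S(x), P, N, and t, and the convention that a case is called positive when its ratio is at or above the threshold, are notational assumptions that do not appear in the original text.

```latex
% S(x): stored cases judged similar to test case x (assumed notation)
% P, N: test cases found malignant / benign at biopsy (assumed notation)
% t: decision threshold applied to the malignancy ratio
\[
  r(x) = \frac{\lvert \{\, c \in S(x) : c \text{ malignant} \,\} \rvert}
              {\lvert S(x) \rvert},
\]
\[
  \mathrm{TPF}(t) = \frac{\lvert \{\, x \in P : r(x) \ge t \,\} \rvert}{\lvert P \rvert},
  \qquad
  \mathrm{FPF}(t) = \frac{\lvert \{\, x \in N : r(x) \ge t \,\} \rvert}{\lvert N \rvert}.
\]
```

As a purely invented illustration, a test case with 12 matched cases of which 3 were malignant would receive r = 3/12 = 0.25. Sweeping t over the observed values of r traces out the ROC curve as the set of points (FPF(t), TPF(t)).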

The  system  is  implemented  as  follows.  The  mammograms  are  read  by  clinicians 
using  a  standard  reporting  lexicon  (BI-RADS™).  These  findings  are  compared  to 
a  database  of  findings  from  cases  with  known  outcomes  (from  biopsy).  The 
fraction  of  similar  cases  that  were  malignant  is  returned.  The  clinician  can  then 
consider this result when making the decision regarding biopsy. This
malignancy fraction is an intuitive measure that can be readily included in the
medical decision. The CBR answers the question "Of
all  cases  that  are  similar  to  this  one,  how  many  were  malignant  at  biopsy?"  This 
process  is  similar  to  that  followed  by  the  clinician  when  considering  the  same 
problem. 

Methods 

The  CBR  was  implemented  as  a  case  retrieval  engine  in  a  relational  database 
framework.  In  this  context,  it  functions  as  a  query  of  a  table  of  cases  and 
outcomes.  To  predict  the  outcome  for  a  new  test  case,  the  test  case  is  compared 
to each case in the database through a matching rule. The prediction is the ratio
of the number of malignant cases to the total number of cases that match.

The  components  of  the  system  include  the  case  encoding  and  the  matching 
rule. The cases are encoded through a subset of the categorical BI-RADS™
image  findings  and  the  patients'  age.  For  the  initial  experiment,  similarity  is 
defined  as  an  exact  match  of  some  subset  of  the  findings.  The  database  has  been 
described  previously[5]  and  was  restricted  for  this  feasibility  study  to  the  first 
500  cases  since  the  properties  of  this  set  were  well  understood  and  numerous 
previous studies had been conducted on this set. Of these 500, 232 of the cases
described masses, 192 cases described microcalcifications, and 29 cases described
masses and microcalcifications that were associated with the mass.
The  remaining  47  cases  did  not  describe  either  a  mass  or  a  calcification  but  were 
architectural  distortions,  asymmetric  breast  density,  focal  asymmetric  density, 
and/or asymmetric breast tissue. Malignant outcomes were reported for 174
(35%) of the biopsies, while 326 (65%) were found to be benign, resulting in a
Positive  Predictive  Value  (PPV)  of  35%. 

The  input  features  were  restricted  to  those  that  had  been  found  to  have 
the  highest  independent  predictive  power  in  our  earlier  studies  and  included  the 
patient age; for masses, the mass margin, mass size, mass density, and mass
shape; and for calcifications, the calcification description, calcification
number, calcification distribution, and special cases/associated findings.
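
The case encoding and exact-match rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ACCESS query used in the study: the `Case` record, its field names, the integer category codes, and the helper functions are all hypothetical, and returning `None` when nothing matches is a choice made here because the text does not say how unmatched cases were handled. How age enters the match is also not specified in this section, so age is stored but left out of the toy rule.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Case:
    """One biopsied case. Field names and integer codes are illustrative assumptions."""
    age: int
    mass_margin: int          # categorical code, 0 = finding not present (assumed)
    mass_density: int
    calc_description: int
    calc_distribution: int
    associated_findings: int
    malignant: bool           # outcome at biopsy


def matches(test: Case, stored: Case, fields: Sequence[str]) -> bool:
    """Exact-match rule: every selected finding must be identical."""
    return all(getattr(test, f) == getattr(stored, f) for f in fields)


def malignancy_ratio(test: Case, database: Sequence[Case],
                     fields: Sequence[str]) -> Optional[float]:
    """Fraction of matched cases that were malignant; None if nothing matches."""
    matched = [c for c in database if matches(test, c, fields)]
    if not matched:
        return None  # assumption: the source does not define this case
    return sum(c.malignant for c in matched) / len(matched)


# Toy data, invented purely to exercise the functions.
FIELDS = ("mass_margin", "mass_density")
db = [
    Case(62, 4, 3, 0, 0, 0, True),
    Case(55, 4, 3, 0, 0, 0, False),
    Case(48, 1, 2, 0, 0, 0, False),
    Case(70, 4, 3, 0, 0, 0, True),
]
query = Case(66, 4, 3, 0, 0, 0, False)
ratio = malignancy_ratio(query, db, FIELDS)  # 2 of the 3 matched cases are malignant
print(ratio)
```

Passing a different tuple of field names changes the strictness of the similarity definition, which is one way the alternative matching rules mentioned in the text could be explored.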

Performance of the predictive system was evaluated through a round-robin
technique in which a test case is selected from the dataset, the database is
formed from all of the remaining cases, all of these remaining cases are
compared to the test case, and those that match are selected. The malignancy
fraction is found for the set of matching cases. The test case is then returned to
the set and another is removed; the resulting system is evaluated, and this is
repeated until all examples have been used as test cases. A threshold is
applied to the set of malignancy fractions, and the sensitivity and false positive
fraction are plotted as the threshold is applied at each value of the malignancy
fraction. A Receiver Operating Characteristic (ROC) curve is plotted from these
ordered  pairs.  The  ROC  areas  were  computed  from  the  resulting  curve  using 
Newton's  method  of  integration.  Used  as  a  performance  measure,  ROC  area 
gives  equal  significance  to  the  sensitivity  and  specificity  resulting  from  the 
application  of  a  specific  threshold.  It  is  clear  that  sensitivity  has  higher  priority 
than  specificity  for  this  problem  since,  while  there  is  a  need  to  reduce  the  number 
of  benign  biopsies,  there  is  a  greater  cost  incurred  by  missing  a  malignancy  than 
by  performing  a  biopsy  on  a  benign  lesion.  As  a  consequence,  a  more 
appropriate  measure  could  be  formed  by  concentrating  on  the  performance  at 
high sensitivity. To concentrate on this region, three other measures are
presented: the partial ROC area for sensitivity greater than 90%, and
the specificity at sensitivities of 98% and 100%.
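
The round-robin evaluation and threshold sweep described above can be sketched as follows. This is an illustration under several assumptions rather than the procedure as implemented in the study: cases are reduced to tuples of findings compared by exact equality, an unmatched test case is given a score of 1.0 so that it is never ruled out (the text does not say how such cases were treated), and the trapezoidal rule is used as a stand-in for the integration method, which the text names without detail. All function names are hypothetical, and the partial ROC area is not shown because its exact definition is not given here.

```python
from typing import List, Sequence, Tuple


def round_robin_scores(findings: Sequence[tuple],
                       outcomes: Sequence[bool]) -> List[float]:
    """Leave-one-out malignancy fraction for every case.

    Each case is matched against all other cases by exact equality of its
    findings tuple. Assumption: a case with no match scores 1.0.
    """
    scores = []
    for i, test in enumerate(findings):
        matched = [outcomes[j] for j, f in enumerate(findings)
                   if j != i and f == test]
        scores.append(sum(matched) / len(matched) if matched else 1.0)
    return scores


def roc_points(scores: Sequence[float],
               outcomes: Sequence[bool]) -> List[Tuple[float, float]]:
    """(FPF, TPF) pairs, calling a case positive when score >= threshold."""
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, outcomes) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, outcomes) if s >= t and not y)
        points.append((fp / n_neg, tp / n_pos))
    return points  # the lowest threshold gives the point (1.0, 1.0)


def trapezoid_area(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def specificity_at(points: Sequence[Tuple[float, float]],
                   min_sensitivity: float) -> float:
    """Best specificity among operating points that reach the sensitivity."""
    return max(1.0 - fpf for fpf, tpf in points if tpf >= min_sensitivity)


# Toy data, invented for illustration: two groups of identical findings.
findings = [("a",)] * 4 + [("b",)] * 4
outcomes = [True, True, True, False, False, False, False, True]

scores = round_robin_scores(findings, outcomes)
points = roc_points(scores, outcomes)
area = trapezoid_area(points)
spec_100 = specificity_at(points, 1.0)
print(points, area, spec_100)
```

On this toy set the one benign case in the mostly malignant group receives the highest score and the one malignant case in the mostly benign group receives the lowest, so the specificity at 100% sensitivity is zero. The same effect, a few atypical malignancies forcing the threshold very low, is the kind of behavior the high-sensitivity measures are meant to expose.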

Results 

In these terms, the performance is presented in Table 1 and described
below. From a previous study on the predictive power of the findings using
linear discriminant analysis (LDA), the following six findings were found to
contribute  significantly:  Age,  Mass  Margin,  Mass  Density,  Calcification 
Description,  Calcification  Distribution,  and  Associated  Findings.  Requiring  an 
exact  match  on  all  six  features  resulted  in  an  ROC  area  of  0.77  but  with  very  poor 
(<1%)  specificity  at  high  sensitivities  of  90%  and  higher. 

Performance of Case Based Reasoning

Matching      ROC Az    Partial    Specificity at      Specificity at
Rule                    ROC Az     100% Sensitivity    98% Sensitivity

6 findings    0.77      0.016      <0.01               0.012

Table 1 CBR performance.

The  histogram  for  the  malignant  (positive)  and  benign  (negative)  cases  as 
a  function  of  malignancy  fraction  is  shown  in  fig.  1.  The  striped  boxes  indicate 
the  negative  cases  while  the  solid  boxes  show  the  positive  cases. 
[Figure: Histogram of CBR outputs. Vertical axis: Cases.]

Figure 1 Histogram of CBR outputs.


The ROC curve is shown in fig. 2.


[Figure: ROC for CBR matching 6 findings. Horizontal axis: False Positive Fraction.]

Figure  2  ROC  curve  for  CBR  with  exact  match  of  6  findings  and  age  within  5  years. 

Less than 0.12 seconds is required to predict the malignancy ratio for a
new case with the system running in a non-optimized ACCESS™ (Microsoft Inc.,
Redmond, Washington) database language on a Pentium II 300 MHz personal
computer.

Discussion 

The  simple  system  described  above  was  developed  as  a  natural  result  of 
our  efforts  to  develop  a  user  interface  for  the  clinical  evaluation  of  the  artificial 
neural network for reducing benign biopsies. As described here, it performs
better than chance but worse than the performance reported for radiologists on
these data[5]. No optimization was performed to refine the matching rule. Future
work will examine other matching rules and will compare the performance to
that of the artificial neural networks.